Abstract

A nonlinear optimization algorithm for linear predictive speech coding was developed
earlier that not only optimizes the linear model coefficients for the open loop predictor,
but performs that optimization including the effects of quantizing the transmitted
residual. It also simultaneously optimizes the quantization levels used for each speech
segment. In this paper, we present an improved method for initializing this nonlinear
algorithm, and demonstrate substantial improvements in performance. In addition,
the new procedure produces monotonically improving speech quality as the number of
bits used for the transmitted error residual increases. Examples of speech encoding
and decoding are given for 8 speech segments, and signal to noise ratios as high as 47
dB are obtained. As in typical linear predictive coding, the optimization is done on
the open loop speech analysis model. Here we demonstrate that minimizing the error
of the closed loop speech reconstruction, instead of the simpler open loop error, is
likely to produce negligible improvement in speech quality. The examples suggest
that the algorithm presented here comes close to the best performance obtainable from
a linear model of the chosen order with the chosen number of bits for the codebook.

1 Introduction 

Linear predictive coding (LPC) techniques were first used for speech analysis and
synthesis by Itakura and Saito [1] and by Atal and Schroeder [2]. Conventional LPC requires
two computational steps: estimation of the coefficients of an all-pole model, and quantization
of the prediction residual [3,4]. Typically, the model is developed or optimized without
regard for the fact that the residual will be quantized before it is transmitted to a receiver
for reconstruction, and in addition the quantization is not optimized for each transmitted
speech segment.

An algorithm was introduced in [5] which starts from the basic LPC framework, but
optimizes the coefficients of the model taking into account the fact that the transmitted
error residual is quantized into a specified number of levels. In other words, the
coefficients are optimized with knowledge of precisely what information will be made
available to the speech synthesis process. The algorithm simultaneously optimizes the levels
chosen for each speech segment rather than using some a priori choice. Because this
algorithm adds these two aspects to the usual open loop optimization, better performance
than typical LPC approaches should be achievable. The purpose of this paper is to present
an improved initialization procedure for the algorithm of [5]. The optimization involved in
the algorithm is nonlinear, so it can converge to a local minimum and fail to realize its full
potential. Good initialization of the optimization can therefore substantially improve
performance, and this is demonstrated here.

Although the algorithms in [5] and in this paper build on the LPC framework, historically
they were developed after observing the attempt in [6] to use blind equalization in speech
encoding. Reference [6] uses just two quantization levels for the error residual. In blind
equalization of a corrupted binary bit stream, a decision is made at each time step about
which of the two possible bits was sent. The procedure is “blind” in the sense that it does not
know what the input sequence was. If the corruption is not too large, the decision process
makes the output equal (or “equalized”) to the input bit stream. It is conceivable that when
only two quantization levels are used for the transmitted error residual in speech encoding,
a similar binary decision could be made in the speech reconstruction or synthesis step, which
would avoid the need to transmit the error residual. In numerical experiments, blind
equalization gave poor results in the closed loop reconstruction necessary for speech
encoding, and hence [6] treats only open loop prediction. Here we do not attempt to use
blind equalization; we transmit the information necessary for reconstructing the residual.
The one aspect of the present algorithm that it shares with [6], and that is not part of
typical LPC, is that the LPC coefficients are optimized with the knowledge that the residual
is quantized. This time we allow an arbitrary number of quantization levels (restricted to
powers of two) rather than just two levels, and furthermore we let the levels be optimized
for each speech segment.


2 Basic Concepts in Linear Predictive Speech Encoding 

Here we summarize the basic formulation of LPC as a framework for later discussion
[3,4]. Let x(k), k = 1, 2, ..., N be the sampled time history of a segment of a speech signal
(denote the segment by S). Then the typical encoding, transmission, and decoding steps are
as follows.

2.1 Encoding: 

The encoding or speech analysis uses an open loop prediction $x_o(k)$ satisfying

$$x_o(k) = -a_1 x(k-1) - a_2 x(k-2) - \cdots - a_n x(k-n) \tag{1}$$

where the coefficients $a_i$ are chosen so that the open loop prediction error $e_o(k)$
minimizes the optimization criterion

$$J_o = \sum_{k \in S} e_o^2(k) \tag{2}$$

where

$$e_o(k) = x(k) - x_o(k) \tag{3}$$

Note that by substituting Eq. (1) into Eq. (3), the speech sequence x(k) exactly satisfies
the finite-difference model

$$x(k) + a_1 x(k-1) + a_2 x(k-2) + \cdots + a_n x(k-n) = e_o(k) \tag{4}$$

By choosing the $a_i$ to minimize the equation error in Eq. (4), one minimizes the one step
ahead prediction error, i.e. the open loop prediction error. The sequence of values of the
input $e_o(k)$ is then quantized in some way to represent $e_o(k)$ by an approximate signal
$\hat{e}_o(k)$ that requires fewer bits to transmit than the full number used for x(k). This
accomplishes compression of the signal.
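A minimal Python sketch of Eqs. (3)-(4) follows. It assumes the segment is held in a 0-based list (so `x[k]` stores the sample x(k+1)) and uses the hypothetical helper name `open_loop_residual`; it returns the open loop residual $e_o(k)$ that conventional LPC would then quantize, and is illustrative only, not code from this paper.

```python
def open_loop_residual(x, a):
    """Open loop residual of Eqs. (3)-(4).

    x: 0-based list of samples, x[k] holding x(k+1).
    a: model coefficients [a_1, ..., a_n].
    Returns e_o for every sample after the first n (the initial conditions).
    """
    n = len(a)
    return [x[k] + sum(a[i] * x[k - 1 - i] for i in range(n))
            for k in range(n, len(x))]


# Usage: a noiseless first order signal leaves a (numerically) zero residual.
print(open_loop_residual([1.0, 0.9, 0.81], [-0.9]))
```

For the signal (1.0, 0.9, 0.81) with $a_1 = -0.9$ both residual values are zero up to rounding, which restates that x(k) satisfies Eq. (4) with $e_o(k) = 0$.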

2.2 Transmission: 

The values of the $a_i$ and the initial conditions of x(k) for n time steps are transmitted,
and the sequence $\hat{e}_o(k)$ for all time steps is transmitted in some form. For an
appropriately chosen order n, the left hand side of Eq. (4) captures the majority of the
signal, so the error in the finite-difference representation, $e_o(k)$, should be substantially
smaller than the signal x(k) itself. Hence using fewer bits to form $\hat{e}_o(k)$ need not
degrade the quality of the reconstructed signal.
2.3 Decoding:

In the speech synthesis step, the signal is reconstructed by the receiver using the closed
loop formula

$$x_c(k) = -a_1 x_c(k-1) - a_2 x_c(k-2) - \cdots - a_n x_c(k-n) + \hat{e}_o(k) \tag{5}$$

starting with the transmitted initial conditions $x_c(k) = x(k)$, k = 1, ..., n. Comparing
to Eq. (4), the only source of error in this reconstruction is the quantization applied to the
transmitted values $\hat{e}_o(k)$.

By using the open loop equation for encoding, one obtains a relatively simple linear
problem for finding the coefficients $a_i$. The reconstruction is necessarily closed loop
because the receiver does not know the previous n values of x(k), so one would expect
better reconstructed values if the encoding optimization were done for the closed loop
prediction equation; however, this is a nonlinear optimization problem that is substantially
more difficult to solve.

3 Encoding Scheme 

In [5], an encoding scheme is introduced which makes the choice of the quantization levels
for $e_o(k)$ part of the optimization. The coefficients $a_i$ are optimized simultaneously with
the choice of these levels.

3.1 Codebook: 

The input $e_o(k)$ is constrained to be a linear combination of the entries in the vectors of a
binary codebook. To form a codebook, first pick the number of bits r to be used. Then
form the column vectors of the codebook as all possible vectors of length r with each entry
either +1 or -1. For example, for r = 4 there are 16 vectors in the codebook. Denote the
ith entry of the jth vector in the codebook as $\delta_{ji}$.
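The codebook construction can be sketched as follows. The index order j is an assumption (whatever Python's `itertools.product` produces; the paper does not specify one), and the hypothetical helper `forcing_level` forms the combination $\beta_1\delta_{j1} + \cdots + \beta_r\delta_{jr}$ that serves as the forcing function in Eq. (7).

```python
from itertools import product


def make_codebook(r):
    """All 2**r code vectors of length r with entries +1 or -1.

    codebook[j][i] plays the role of delta_{ji}; the index order j is
    whatever itertools.product produces (an assumption).
    """
    return [list(v) for v in product((1, -1), repeat=r)]


def forcing_level(beta, delta_j):
    """u(j) = beta_1*delta_j1 + ... + beta_r*delta_jr, as in Eq. (7)."""
    return sum(b * d for b, d in zip(beta, delta_j))


print(len(make_codebook(4)))  # 16
print(sorted(forcing_level([1.0, 0.5], v) for v in make_codebook(2)))
# [-1.5, -0.5, 0.5, 1.5]
```

With r = 2 and $(\beta_1, \beta_2) = (1.0, 0.5)$ the four code vectors give the four levels $\pm 1.5$ and $\pm 0.5$, illustrating how r bits yield $2^r$ levels whose spacing is set by the $\beta_i$.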

3.2 Encoding: 

The encoding in Eq. (1) is modified as follows for the jth codebook entry

$$x_o(k, j(k)) = -a_1 x(k-1) - a_2 x(k-2) - \cdots - a_n x(k-n) + u(j(k)) \tag{6}$$

where the forcing function is taken as a linear combination

$$u(j(k)) = \beta_1 \delta_{j1} + \beta_2 \delta_{j2} + \cdots + \beta_r \delta_{jr} \tag{7}$$

of the jth codebook vector entries. The objective is then to determine values of the $a_i$
and $\beta_i$ that are constant over all time steps of the speech segment, and to determine a
codebook entry j(k) for every time step k, in order to achieve the following minimization


$$J_o = \min_{a_i,\,\beta_i,\,j(k)} \sum_{k=1}^{N} \lambda^{N-k} e_o^2(k, j(k)) \tag{8}$$

$$e_o(k, j(k)) = x(k) - x_o(k, j(k)) \tag{9}$$


Here $\lambda$ is a positive number less than or equal to one, representing a forgetting factor.

To accomplish this minimization, Ref. [5] formulates the recursive least squares equations
for finding the values of the coefficients $a_i$ and $\beta_i$ that minimize the weighted (by
$\lambda$) Euclidean norm of the equation errors for all k and any choice of j for the equation

$$x(k) + a_1 x(k-1) + a_2 x(k-2) + \cdots + a_n x(k-n) = \beta_1 \delta_{j1} + \beta_2 \delta_{j2} + \cdots + \beta_r \delta_{jr} \tag{10}$$

As noted earlier for LPC, this process minimizes the (weighted) open loop prediction error
of Eq. (6). Such a recursive computation produces running estimates $\hat{a}_i(k)$, $\hat{\beta}_i(k)$.
The desired coefficients minimizing the least squares error are obtained when k reaches N.
However, Ref. [5] also incorporates the choice of j into this running estimation, picking its
value at each time step to minimize the current estimation error before progressing to the
next step. The result is that for sufficiently long data sets, the recursively updated values
$\hat{a}_i(k)$, $\hat{\beta}_i(k)$ converge to constant values along with a computed set of j(k) for the
speech block. The value of $\lambda$ can be adjusted to influence the number of data points
needed to reach constant values.
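The recursive estimation with a per-step codebook choice might be sketched as below. This is a minimal, hypothetical Python version written for illustration, not the implementation of Ref. [5]. It assumes the standard exponentially weighted RLS update, the regressor $\phi = [-x(k-1), \ldots, -x(k-n), \delta_{j1}, \ldots, \delta_{jr}]$ implied by Eq. (10), and that the "current estimation error" used to pick j(k) is the a priori error under the estimates from the previous step. The default `p_init=1e5` mirrors the P(0) = 100,000 times the identity used in [5].

```python
from itertools import product


def rls_codebook_encode(x, n, r, lam=0.999, p_init=1e5, theta0=None, P0=None):
    """Weighted RLS for Eq. (10) with a greedy per-step codebook choice.

    theta = [a_1..a_n, beta_1..beta_r]; for code vector j at time k the
    regressor is phi = [-x(k-1), ..., -x(k-n), delta_j1, ..., delta_jr],
    so Eq. (10) reads x(k) = phi^T theta. x is a 0-based list.
    """
    m = n + r
    theta = list(theta0) if theta0 is not None else [0.0] * m
    if P0 is not None:
        P = [list(row) for row in P0]
    else:
        P = [[p_init if i == c else 0.0 for c in range(m)] for i in range(m)]
    codebook = [list(v) for v in product((1.0, -1.0), repeat=r)]
    choices = []
    for k in range(n, len(x)):
        past = [-x[k - i] for i in range(1, n + 1)]
        # Assumed selection rule: smallest a priori error with current theta.
        best_j, best_err = 0, None
        for j, delta in enumerate(codebook):
            phi_j = past + delta
            err = x[k] - sum(t * f for t, f in zip(theta, phi_j))
            if best_err is None or abs(err) < abs(best_err):
                best_j, best_err = j, err
        choices.append(best_j)
        phi = past + codebook[best_j]
        # Standard RLS update with forgetting factor lam (P stays symmetric).
        Pphi = [sum(P[i][c] * phi[c] for c in range(m)) for i in range(m)]
        denom = lam + sum(p * f for p, f in zip(Pphi, phi))
        gain = [v / denom for v in Pphi]
        theta = [t + g * best_err for t, g in zip(theta, gain)]
        P = [[(P[i][c] - gain[i] * Pphi[c]) / lam for c in range(m)]
             for i in range(m)]
    return theta[:n], theta[n:], choices, P
```

The final P is returned so that it can seed a later run. Because each choice of j(k) feeds back into the regression, the outcome depends on the starting `theta0` and `P0`, which is exactly the sensitivity to initialization discussed in this paper.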


3.3 Transmission: 

The coded signal can be transmitted by sending the final minimizing values of the $a_i$
and $\beta_i$, the initial conditions, and the code vector entry number j(k) identifying the
minimizing code vector at each time step. Since the choice of code vector typically does
not change every time step, one can compress the data further by transmitting only the
changes in the code vector when they occur.
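One simple way to transmit only the changes is to send (time index, codebook index) pairs whenever j(k) changes. The paper does not specify the exact format, so this run-length style scheme and the helper names below are assumptions for illustration.

```python
def encode_changes(choices):
    """Keep only (time index, codebook index) pairs where j(k) changes."""
    changes = []
    previous = None
    for k, j in enumerate(choices):
        if j != previous:
            changes.append((k, j))
            previous = j
    return changes


def decode_changes(changes, length):
    """Expand the change list back to one codebook index per time step."""
    out = []
    for idx, (start, j) in enumerate(changes):
        stop = changes[idx + 1][0] if idx + 1 < len(changes) else length
        out.extend([j] * (stop - start))
    return out


print(encode_changes([3, 3, 3, 1, 1, 3]))  # [(0, 3), (3, 1), (5, 3)]
```

The receiver also needs the segment length N to expand the last run, which is already implied by the transmitted segment.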

3.4 Reconstruction: 

The speech synthesis uses the transmitted information to determine u(j(k)) according to
Eq. (7), and recursively computes

$$x_c(k) = -a_1 x_c(k-1) - a_2 x_c(k-2) - \cdots - a_n x_c(k-n) + u(j(k)) \tag{11}$$

using the transmitted initial values of x(k) as the initial conditions on $x_c(k)$.
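The closed loop synthesis of Eq. (11) is a short recursion. The sketch below assumes a 0-based list for the transmitted initial conditions x(1), ..., x(n), a codebook given as a list of ±1 vectors, and one codebook index per synthesized sample; the function name `synthesize` is hypothetical.

```python
def synthesize(x_init, a, beta, codebook, choices):
    """Closed loop reconstruction of Eq. (11).

    x_init:   transmitted initial conditions x(1), ..., x(n) (0-based list).
    a, beta:  transmitted model coefficients and level weights.
    codebook: list of +/-1 code vectors; choices: index j(k) for each new sample.
    """
    n = len(a)
    xc = list(x_init)
    for j in choices:
        u = sum(b * d for b, d in zip(beta, codebook[j]))      # Eq. (7)
        feedback = sum(a[i] * xc[-1 - i] for i in range(n))    # uses x_c, not x
        xc.append(-feedback + u)
    return xc


print(synthesize([0.0], [-0.5], [1.0], [[1.0], [-1.0]], [0, 0, 1]))
# [0.0, 1.0, 1.5, -0.25]
```

The only difference from the open loop prediction of Eq. (6) is that past reconstructed values $x_c$ replace the true samples x, so any error is fed back into later samples.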
3.5 Initialization:


The initialization for the minimization process starts with the choice of the number of
codebook entries, i.e. the number of bits r, and then needs initial guesses for the coefficients
$\hat{a}_i(0)$, $\hat{\beta}_i(0)$ and an initial value for the covariance matrix P(0) in the least squares
update formula. As is typically done in recursive least squares, Ref. [5] sets the $\hat{a}_i(0)$ and
$\hat{\beta}_i(0)$ to zero, and P(0) to a large number ([5] uses 100,000 in its examples) times the
identity matrix of appropriate dimension.

The achievable values of u(j(k)) are given by choosing all possible signs in
$u(j(k)) = \pm\beta_1 \pm \beta_2 \pm \cdots \pm \beta_r$, producing $2^r$ levels. The optimization achieved
here differs from that in LPC because the discretization levels are now optimized for each
speech block, and in addition, the coefficients $a_i$ are optimized with knowledge of these
levels. Hence, for a given number of quantization levels, if a global minimum of Eq. (8) is
achieved, then the method of Ref. [5] would necessarily perform at least as well as typical
LPC with the same number of levels. The problem addressed here is nonlinear, and hence it
is possible to converge to a local minimum. Whether or not one reaches a good minimum can
depend on the starting conditions of the minimization process, i.e. the initialization. The
objective of this paper is to present improved starting conditions for the algorithm, and to
demonstrate the resulting improvement in error levels upon convergence.

4 Improved Starting Conditions 

Instead of starting with the desired bit number and performing the optimization, we first
optimize for bit number r = 1, use the results to optimize for bit number r = 2, and continue
until the desired bit number (or speech quality) is reached.

For bit number r = 1, we need initial values for the $a_i$ and $\beta_1$, as well as the initial value
of the $(n+1) \times (n+1)$ dimensional covariance matrix P. The value $\hat{\beta}_1(0) = 0$ is used,
but the $\hat{a}_i(0)$ are estimated by minimizing the sum of the squares of the $e_o(k)$ in Eq. (4) over
the speech segment. Just as in LPC, it is desirable for the left hand side to capture as
much of the behavior of the signal as possible, leaving as little of the residual as possible
for u(j(k)) to capture. Write Eq. (4) in matrix form including each time step of the
length-N speech segment, using the first n points as the initial conditions

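$$\mathbf{x} = -A\mathbf{a} + \mathbf{e} \tag{12}$$

where

$$\mathbf{x} = \begin{bmatrix} x(n+1) & x(n+2) & \cdots & x(N) \end{bmatrix}^T, \quad \mathbf{e} = \begin{bmatrix} e_o(n+1) & e_o(n+2) & \cdots & e_o(N) \end{bmatrix}^T, \quad \mathbf{a} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \end{bmatrix}^T$$

$$A = \begin{bmatrix} x(n) & x(n-1) & \cdots & x(1) \\ x(n+1) & x(n) & \cdots & x(2) \\ \vdots & \vdots & & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-n) \end{bmatrix} \tag{13}$$

Then the value of $\mathbf{a}$ that minimizes $\mathbf{e}^T\mathbf{e}$, i.e. the desired starting value $\hat{\mathbf{a}}(0)$, satisfies
$-A^T\mathbf{x} = (A^T A)\,\hat{\mathbf{a}}(0)$, which can be rewritten as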


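$$\hat{\mathbf{a}}(0) = -P_0 X \tag{14}$$

where

$$X = A^T \mathbf{x} = \begin{bmatrix} C_{n,n+1} & C_{n-1,n+1} & \cdots & C_{1,n+1} \end{bmatrix}^T, \qquad P_0 = (A^T A)^{\dagger} = \begin{bmatrix} C_{n,n} & C_{n,n-1} & \cdots & C_{n,1} \\ C_{n-1,n} & C_{n-1,n-1} & \cdots & C_{n-1,1} \\ \vdots & \vdots & & \vdots \\ C_{1,n} & C_{1,n-1} & \cdots & C_{1,1} \end{bmatrix}^{\dagger} \tag{15}$$

and the superscript $\dagger$ indicates the inverse, or the Moore-Penrose pseudoinverse if
appropriate. Here $C_{ij} = \sum_{k=n+1}^{N} x(k-n-1+i)\,x(k-n-1+j)$ is the correlation between the
data sequence x(k) and the same sequence shifted by i - j time steps. Thus, $P_0$ is the
inverse (or pseudoinverse) of the data correlation matrix.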
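The starting values of Eqs. (12)-(15) amount to solving the normal equations $(A^T A)\,\hat{\mathbf{a}}(0) = -A^T\mathbf{x}$. The sketch below builds A row by row, forms $X = A^T\mathbf{x}$ and $A^T A$, and inverts the latter by Gauss-Jordan elimination. It assumes $A^T A$ is nonsingular (the pseudoinverse case is not handled), and the helper names are hypothetical. It also returns $P_0$, since that matrix seeds the covariance matrix P(0) of the recursive algorithm.

```python
def invert(M):
    """Gauss-Jordan inverse with partial pivoting (assumes M is nonsingular)."""
    m = len(M)
    aug = [[float(v) for v in row] + [1.0 if i == c else 0.0 for c in range(m)]
           for i, row in enumerate(M)]
    for col in range(m):
        piv = max(range(col, m), key=lambda i: abs(aug[i][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        d = aug[col][col]
        aug[col] = [v / d for v in aug[col]]
        for i in range(m):
            if i != col:
                f = aug[i][col]
                aug[i] = [vi - f * vc for vi, vc in zip(aug[i], aug[col])]
    return [row[m:] for row in aug]


def initial_lpc(x, n):
    """Return (a0, P0) with P0 = (A^T A)^{-1} and a0 = -P0 A^T x, Eqs. (12)-(15)."""
    # Row of A for sample x(k), k = n+1..N: [x(k-1), x(k-2), ..., x(k-n)]
    rows = [[x[k - i] for i in range(1, n + 1)] for k in range(n, len(x))]
    target = x[n:]
    R = [[sum(row[p] * row[q] for row in rows) for q in range(n)] for p in range(n)]
    X = [sum(row[p] * t for row, t in zip(rows, target)) for p in range(n)]
    P0 = invert(R)
    a0 = [-sum(P0[p][q] * X[q] for q in range(n)) for p in range(n)]
    return a0, P0


x = [1.0, 0.5]
for _ in range(18):
    x.append(1.2 * x[-1] - 0.5 * x[-2])   # noiseless second order test signal
a0, P0 = initial_lpc(x, 2)
print([round(v, 6) for v in a0])  # [-1.2, 0.5]
```

Because the test signal satisfies x(k) - 1.2x(k-1) + 0.5x(k-2) = 0 exactly, Eq. (4) holds with zero residual and the least squares solution recovers $a_1 = -1.2$, $a_2 = 0.5$.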

The weighted recursive least squares algorithm is a recursive version of a least squares
equation like Eq. (14), but including the $\beta_i$ and a forgetting factor. It computes the change
needed in the coefficient estimates each time a new data point is added to the data set. Part
of the recursive formula is a recursive version of the matrix $P_0 = (A^T A)^{\dagger}$ above, generalized
to include the $\beta_i$ terms and denoted by P. For bit number r = 1, the P(0) of the recursive
formula is the inverse of the correlation matrix for $[\mathbf{a}^T\;\beta_1]^T$, and hence we use $P_0$ from
Eq. (15) for the upper left $n \times n$ partition, and need to assign values for one more row
and column. All these new elements are set to zero, except for the final diagonal element
associated with knowledge of $\beta_1$, which is chosen as $10^6$. Such a large number represents
essentially no a priori knowledge about this coefficient.

Once the solution for bit number r = 1 is obtained, we progress to bit number r = 2, and
so on. In general, when going from r to r + 1 for any r, the initial values are set as
follows:
1. The final values of the coefficients $a_i$ for bit number r, obtained after running the
recursive least squares until the values stabilize, are used as the starting values
$\hat{a}_i(0)$ for the new problem using r + 1 bits.

2. The same procedure is used for the r initial values $\hat{\beta}_1(0), \hat{\beta}_2(0), \ldots, \hat{\beta}_r(0)$.
The initial value for $\beta_{r+1}$ is set to zero.

3. The $(n+r+1) \times (n+r+1)$ dimensional P(0) for the problem using r + 1 bits takes
the form of a block diagonal matrix

$$P(0) = \mathrm{diag}\big(P_{11}(0),\; P_{22}(0),\; P_{33}(0)\big) \tag{16}$$

4. The $P_{11}(0)$ for the problem with bit number r + 1 is of dimension $n \times n$, and is taken
as the final value of the upper left $n \times n$ partition of the $(n+r) \times (n+r)$ dimensional
matrix P for bit number r, after finishing the recursive computation.

5. The $P_{22}(0)$ for bit number r + 1 is of dimension $r \times r$, and is the product of the $r \times r$
identity matrix and the norm, or maximum singular value, of $P_{11}(0)$.

6. The $P_{33}(0)$ for bit number r + 1 is a scalar set to $10^6$.
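Items 1 to 6 can be sketched as a single warm-start helper. Assumptions: θ is ordered as $[a_1, \ldots, a_n, \beta_1, \ldots, \beta_r]$; the "norm" of item 5 is computed as the largest singular value by power iteration, which is valid because $P_{11}$ is symmetric positive semidefinite in RLS (the iteration could settle on a smaller eigenvalue if the all-ones start vector happened to be orthogonal to the dominant eigenvector); and the helper names are hypothetical. Passing an empty β list together with $P_0$ of Eq. (15) reproduces the r = 1 initialization described earlier, since $P_{22}$ is then empty.

```python
import math


def spectral_norm_psd(M, iters=500):
    """Largest singular value of a symmetric positive semidefinite matrix.

    Power iteration from an all-ones start; for such matrices the largest
    singular value equals the largest eigenvalue.
    """
    m = len(M)
    v = [1.0 / math.sqrt(m)] * m
    norm = 0.0
    for _ in range(iters):
        w = [sum(M[i][c] * v[c] for c in range(m)) for i in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            return 0.0
        v = [c / norm for c in w]
    return norm


def warm_start(a_prev, beta_prev, P_prev, n, free_var=1e6):
    """Starting theta(0) and P(0) of Eq. (16) for bit number r + 1.

    a_prev, beta_prev: final estimates for bit number r (beta_prev may be empty).
    P_prev: final (n + r) x (n + r) matrix P for bit number r.
    """
    r = len(beta_prev)
    theta0 = list(a_prev) + list(beta_prev) + [0.0]      # items 1 and 2
    P11 = [list(row[:n]) for row in P_prev[:n]]          # item 4
    s = spectral_norm_psd(P11)                           # item 5
    m = n + r + 1
    P0 = [[0.0] * m for _ in range(m)]
    for i in range(n):
        P0[i][:n] = P11[i]
    for i in range(n, n + r):
        P0[i][i] = s                                     # P22 = s * I_r
    P0[m - 1][m - 1] = free_var                          # item 6: P33
    return theta0, P0


theta0, P0 = warm_start([0.1, 0.2], [0.3], [[2, 0, 0], [0, 5, 0], [0, 0, 7]], n=2)
print(theta0)  # [0.1, 0.2, 0.3, 0.0]
```

In this small example the final $\beta_1$ entry of the old P (the 7) is discarded, $P_{22}$ becomes 5, the largest eigenvalue of $P_{11} = \mathrm{diag}(2, 5)$, and the new coefficient $\beta_2$ gets $10^6$.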

This initialization procedure makes full use of the available information for the $a_i$. The
initialization for the $\beta_i$ is somewhat ad hoc, and is made with the following considerations
in mind. Numerical experiments showed that using the full $(n+r) \times (n+r)$ final matrix
P for bit level r, in place of the first two partitions of the block diagonal P(0) for the
next bit number, results in rather small adjustments of the model coefficients at the next
level, and correspondingly small improvements in speech quality with each bit number.
On the other hand, replacing the $P_{22}(0)$ of item 5 by $10^6$ times the $r \times r$ identity matrix,
i.e. using essentially no a priori information about the first r coefficients among the $\beta_i$, did
not achieve good results either: it appears to converge to a local minimum with poor speech
quality. The choice described above allows these r coefficients to be adjusted about as much
as the $a_i$, and this appears to be a good compromise. There is no a priori information on
the remaining coefficient, $\beta_{r+1}$, and using $10^6$ leaves it free to be adjusted.

5 Performance of the Modified Algorithm 

Eight speech segments from two speakers are used to demonstrate the performance of the
modified algorithm. The first four are from a female speaker, and correspond to the words:
The pipe / be-gan / to rust / while new. The remaining four are from a male speaker
saying: Oak is / strong / and also / gives shade. The lengths of these eight segments are
3100, 3550, 4720, 6650, 4300, 3700, 4500, and 5450 data points, respectively. The filter
order is chosen to be n = 10, which is a commonly used order for LPC speech modeling. The
forgetting factor is set to $\lambda = 0.999$.

Two measures of the speech quality of the reconstructed signal are considered: the
Euclidean norm of the error, ||err||, and the signal to noise ratio, SNR, i.e. the ratio of the
norm of the signal to the norm of the error, in dB:

$$\|err\| = \Big[\sum_k \big(x(k) - x_c(k)\big)^2\Big]^{1/2}$$

$$\mathrm{SNR} = 20\log_{10}\big(\|x\| / \|err\|\big), \qquad \|x\| = \Big[\sum_k x^2(k)\Big]^{1/2} \tag{17}$$
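The two measures of Eq. (17) can be computed directly. This is a minimal sketch with hypothetical helper names, using base-10 logarithms for the dB value.

```python
import math


def error_norm(x, xc):
    """||err|| of Eq. (17): Euclidean norm of x(k) - x_c(k)."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(x, xc)))


def snr_db(x, xc):
    """SNR of Eq. (17) in dB: 20 log10(||x|| / ||err||)."""
    signal_norm = math.sqrt(sum(v * v for v in x))
    return 20.0 * math.log10(signal_norm / error_norm(x, xc))


print(snr_db([3.0, 4.0], [3.0, 3.5]))  # 20.0
```

For x = (3, 4) and $x_c$ = (3, 3.5), ||x|| = 5 and ||err|| = 0.5, so the SNR is 20 log10(10) = 20 dB; every 20 dB of SNR corresponds to a tenfold reduction of ||err|| relative to ||x||.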

Tables 1 and 2 give these measures for the algorithm of Ref. [5] applied to the eight
speech segments, for bit numbers ranging from r = 1 to 10. To gauge the amount of
compression obtained at each bit level, note that the unencoded signal uses 16 bits per sample.
The SNRs for 10 bits tend to be in the range from 8 to 11 dB. The SNR tends to saturate
as the bit number increases, with only small improvements obtained by increasing the
bit number beyond 4 or 5. Moreover, an important shortcoming is that the speech quality
does not necessarily improve each time the number of bits is increased. This would not
occur if we were able to obtain a global minimum each time.

Tables 3 and 4 give the corresponding results using the modified algorithm with the
improved starting conditions. The average SNR at bit number r = 10 is 35 dB for the
female speaker and 28 dB for the male speaker, a very substantial improvement. Because
the results for bit number r are used to start the algorithm for bit number r + 1, the
resulting SNRs now improve monotonically with increasing bit number. There appears to
be a relationship between how good the bit number 1 result is and how good the results at
higher bit numbers are. For example, segment number 3 starts with the highest SNR at bit
number 1, and at bit number 10 it is still the highest, with an impressive SNR of 47 dB.
Similarly, segment number 8 starts with the lowest SNR and ends with the lowest at bit
number 10.

Using the result from the previous bit number makes the computation for the next bit
number take less time than starting from the initialization used for that bit number in
Ref. [5]. For speech segment 4, which is the longest segment, the solutions for bit levels
r = 4 through 7 took about 48% less time than using [5], and for bit levels r = 8, 9,
and 10 they took 43%, 34%, and 27% less time, respectively. However, to get the initialization
for a given bit number we need to run all lower bit numbers first, which means that the new
initialization takes longer overall. For segment 4, the total computation time for bit level 4
is approximately twice that of Ref. [5], and for bit level 7 somewhat less than three times.
For this extra computation time the signal to noise ratios improve from 8.9 and 9.5 dB to
16 and 27 dB, respectively. These computation times, using code written in Matlab and run
on a workstation, are near real time.

The mean opinion score (MOS) is the most commonly used measure of the subjective
quality of coded speech. It is extracted from the results of a category-rated test performed
by 20 to 60 untrained listeners. Reference [3] describes a curve fitting procedure used
to convert MOS to equivalent Q values (EQ), i.e. dB levels which we can compare to our
SNRs. The dB values are categorized in increments of 5 dB, from 5 dB (bad) to
35 dB (good). Table 5, reproduced from [3], gives such evaluations for some existing coders.
The flat condition in the table refers to unfiltered speech recorded with a high quality
microphone, and the IRS condition refers to speech filtered through an IRS transmitting
filter, such as speech that would be recorded from a typical telephone handset. The line
labeled “source” represents the error between the original signal and the 16-bit signal
that is then used for the encoding. Among the coding methods listed, the conventional
LDCELP employs a 10-bit codebook with a 50th order LPC predictor and a 10th order
adaptive linear predictor. VSELP uses two 7-bit codebooks and a long term filter state,
which is also a 7-bit codebook, with a 10th order LPC predictor to carry out speech coding.
Together this requires 14 bits for index delivery, so for comparison purposes one must
compare with the performance of a 14-bit codebook in the method presented here (beyond
the last entry, for 10 bits, in our tables). Table 5 gives a rough idea of what we might
expect if MOS tests were run on the current method, and it suggests that the present
method is competitive. However, true MOS tests under uniform testing conditions for each
vocoder (voice encoder) are needed to actually determine the potential performance
advantages of the new method.

As in LPC, the information transmitted by the vocoder proposed here is optimized for
an open loop predictor, but the receiver necessarily reconstructs with a closed loop
predictor. It is of interest to see how much signal is lost in the open loop encoding and
how much is lost in the closed loop reconstruction. This information is given in Tables 6
through 9. The column labeled SNRc is the signal to noise ratio given previously for the
reconstructed signal using the closed loop formula (11), and SNRo is the signal to noise
ratio of the open loop prediction of Eq. (6). The third column gives SNRc as a percentage
of SNRo. The best the reconstruction could possibly do is to reproduce the open loop
encoding, which corresponds to 100%. A smaller percentage indicates the fraction of SNR
lost in going from open to closed loop for the speech reconstruction. By bit number 10 the
SNR lost is about one fourth, with the percentages ranging from 68.9% to 78.2% for the
8 speech segments. Again, the speech segment with the best percentage at bit number 1 has
the best percentage at bit number 10.
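As a worked instance of the percentage column, take segment 3 at bit number 10 from Table 7:

$$\frac{\mathrm{SNRc}}{\mathrm{SNRo}} \times 100\% = \frac{46.7031}{59.7176} \times 100\% \approx 78.21\%, \qquad 100\% - 78.21\% \approx 21.8\%$$

so even for the best segment roughly a fifth of the open loop SNR is lost in the closed loop reconstruction, in line with the loss of about one fourth quoted for the segments overall.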

6 Potential Improvement with Closed Loop Optimization 

What matters in any vocoder is the quality of the reconstructed speech. LPC optimizes the
quality of the speech encoded with the open loop equation (6) because this optimization is
relatively simple, and the same is done here. Presumably, improved open loop encoding is
reflected in improved closed loop reconstruction. In this section we address the question of
how much improvement might be obtainable if we optimized the error in the reconstruction
instead. This means that we replace Eqs. (2) and (3) by

$$J_c = \sum_{k \in S} e_c^2(k) \tag{18}$$

$$e_c(k) = x(k) - x_c(k, j(k)) \tag{19}$$

$$x_c(k, j(k)) = -a_1 x_c(k-1, j(k-1)) - a_2 x_c(k-2, j(k-2)) - \cdots - a_n x_c(k-n, j(k-n)) + u(j(k)) \tag{20}$$

with the closed loop output $x_c(k)$ of Eq. (11) substituted, and then develop an algorithm to
minimize Eq. (18) over the $a_i$, $\beta_i$, and j(k). To minimize $J_c$, we develop a nonlinear least
squares algorithm that uses analytical gradient and Hessian information, setting to zero any
negative eigenvalues of the portion of the Hessian that comes from the second derivative
terms [7]. These iterations are started for each bit level from the results of the vocoder
developed here. Thus, the nonlinear least squares algorithm of this section could be made
the second part of the total speech algorithm, aiming to reach a speech encoding whose
reconstruction is the best possible for the chosen model order and bit number.
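To make the difference between the two criteria concrete, the sketch below evaluates, for one fixed parameter set, the unweighted open loop cost (Eqs. (2) and (9), predicting from the true past samples) and the closed loop cost $J_c$ of Eqs. (18)-(20) (predicting from the reconstructed past samples). It does not reproduce the nonlinear least squares iteration of [7]; the function name and the toy numbers are hypothetical.

```python
def open_and_closed_loop_costs(x, a, beta, codebook, choices):
    """Return (J_o, J_c) for one fixed parameter set.

    J_o: sum of squared open loop errors, prediction from the true x (Eqs. (2), (9)).
    J_c: sum of squared closed loop errors, prediction from x_c (Eqs. (18)-(20)).
    choices[k - n] is the codebook index used for sample k (0-based).
    """
    n = len(a)
    u = [sum(b * d for b, d in zip(beta, codebook[j])) for j in choices]
    j_open = 0.0
    for k in range(n, len(x)):
        x_open = -sum(a[i] * x[k - 1 - i] for i in range(n)) + u[k - n]
        j_open += (x[k] - x_open) ** 2
    xc = list(x[:n])  # transmitted initial conditions
    j_closed = 0.0
    for k in range(n, len(x)):
        x_closed = -sum(a[i] * xc[k - 1 - i] for i in range(n)) + u[k - n]
        xc.append(x_closed)
        j_closed += (x[k] - x_closed) ** 2
    return j_open, j_closed


print(open_and_closed_loop_costs([0.0, 1.1, 1.5, -0.25], [-0.5], [1.0],
                                 [[1.0], [-1.0]], [0, 0, 1]))
```

For this toy signal the open loop cost is about 0.0125 and the closed loop cost about 0.01: the open loop prediction at the next step uses the perturbed sample 1.1, while the closed loop prediction uses the reconstructed value 1.0. The two criteria therefore rank parameter sets differently, which is why minimizing $J_c$ directly is a distinct (nonlinear) problem.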

Table 10 gives the results of this optimization. For bit number 10 the improvement over
Tables 3 and 4 is always less than 1 dB, and often substantially less. Thus, we conclude
that the extra complexity of optimizing the reconstructed speech signal error, as an extra
step after optimizing the open loop encoding, is not justified. Of course, optimizing the
reconstructed speech signal is a nonlinear optimization, and there is no way to know
whether we have found the global minimum using the nonlinear least squares algorithm
here, initialized from the open loop optimization results. Nevertheless, the consistency of
these results for all 8 speech segments suggests that only a very small improvement is
available from doing the closed loop optimization in place of the open loop optimization.
This suggests that the vocoder developed here captures essentially all of the speech quality
available for the chosen filter order and bit number (or codebook vectors).

7 Conclusions 

Here we have developed an initialization process for the previously developed vocoder that
very substantially improves its performance. It also consistently gives improved performance
when the number of bits used is increased. Although we optimize the open loop predictor
as LPC does, the improvement that could be obtained by directly optimizing the quality of
the closed loop reconstructed speech signal is quite small, small enough that any significant
extra computational effort would not be justified. Rough comparisons indicate that the
proposed vocoder could be competitive in performance. The next step is to evaluate its
potential performance advantages with MOS tests comparing it to existing methods.
                seg# 1            seg# 2            seg# 3            seg# 4
bit#  ||err||      SNR  ||err||      SNR  ||err||      SNR  ||err||      SNR
1     10.6900   2.1020  10.2170   3.1198  10.2260   3.0743  10.0250   2.5206
2      7.5088   5.1704   7.2539   6.0948   6.9625   6.4135   6.0034   6.9742
3      6.1292   6.9338   6.2971   7.3234   5.5346   8.4070   5.0588   8.4613
4      5.7824   7.4397   6.4906   7.0606   4.9370   9.3995   4.8093   8.9005
5      5.6382   7.6591   6.1148   7.5786   5.2342   8.8918   4.5001   9.4777
6      5.5952   7.7256   7.8297   5.4313   4.8731   9.5128   4.4749   9.5266
7      5.6201   7.6870   6.5351   7.0012   4.6804   9.8631   4.4726   9.5310
8      6.1560   6.8958   8.6430   4.5730   4.9920   9.3034   4.3559   9.7606
9      6.1446   6.9119   7.1184   6.2586   4.7546   9.7266   4.4212   9.6313
10     5.4031   8.0289   5.3697   8.7072   4.6582   9.9045   4.3060   9.8608

Table 1: The Euclidean norm and the signal to noise ratio for segments #1, #2, #3, and
#4 using the original initialization in [5].
                seg# 5            seg# 6            seg# 7            seg# 8
bit#  ||err||      SNR  ||err||      SNR  ||err||      SNR  ||err||      SNR
1      9.2846   2.2080  10.5040   2.5518   8.8284   2.5670   8.9596   1.0189
2      5.7972   6.2990   7.6989   5.2505   4.9202   7.6451   6.8906   3.2996
3      4.5242   8.4525   6.8499   6.2653   3.5340  10.5190   6.4524   3.8702
4      4.0774   9.3557   6.5148   6.7010   3.3269  11.0439   6.3875   3.9580
5      4.0296   9.4580   6.0687   7.3171   3.2257  11.3123   9.8403   0.2045
6      3.9711   9.5852   6.3260   6.9565   2.8116  12.5057   9.7462   0.2879
7      3.8738   9.8006   5.9694   7.4604   2.7830  12.5944   7.5068   2.5556
8      3.9596   9.6104   6.2548   7.0548   2.7642  12.6532   8.1778   1.8120
9      3.8545   9.8440   5.8460   7.6419   2.7472  12.7067   7.2672   2.8373
10     3.5071  10.6643   4.6599   9.6115   2.7996  12.5428  10.1554  -0.0693

Table 2: The Euclidean norm and the signal to noise ratio for segments #5, #6, #7, and
#8 using the original initialization in [5].



                seg# 1            seg# 2            seg# 3            seg# 4
bit#  ||err||      SNR  ||err||      SNR  ||err||      SNR  ||err||      SNR
1     10.5016   2.2567  12.6157   2.9729  10.7945   3.1509  13.2577   2.6584
2      6.0716   7.0158   7.5711   7.4079   7.8669   5.8988   7.8404   7.2209
3      3.5933  11.5721   4.4410  12.0415   4.3603  11.0247   4.7244  11.6208
4      2.3918  15.1072   2.9581  15.5708   2.2975  16.5897   2.8488  16.0145
5      1.6356  18.4081   1.9893  19.0172   1.3518  21.1970   1.8872  19.5915
6      1.2033  21.0746   1.3757  22.2205   0.7285  26.5661   1.1658  23.7752
7      0.9633  23.0066   0.9821  25.1480   0.3593  32.7065   0.8148  26.8870
8      0.8184  24.4221   0.7491  27.5006   0.2071  37.4924   0.5637  30.0868
9      0.6744  26.1030   0.6171  29.1840   0.1265  41.7704   0.4312  32.4149
10     0.5545  27.8036   0.5056  30.9153   0.0717  46.7031   0.3596  33.9909


Table 3: The Euclidean norm of the error and the signal to noise ratio for segments #1, 
#2, #3, and #4 using the new initialization procedure. 
                seg# 5            seg# 6            seg# 7            seg# 8
bit#  ||err||      SNR  ||err||      SNR  ||err||      SNR  ||err||      SNR
1          --   2.5381  10.4803   2.5715   9.8475   2.6128  11.7792       --
2      5.2222   8.1861       --   7.1206   5.6635   7.4177   7.9410       --
3          --  12.3137   4.3628  10.1836   3.4512  11.7199   7.4338       --
4          --  16.0508       --  14.4147   2.1235  15.9383   3.9910       --
5      1.7588  17.6387   1.6751  18.4982   1.3870  19.6379       --  14.1309
6      1.4205  19.4940       --  22.3223   1.0235  22.2772       --  16.5717
7      1.1707  21.1743       --  26.1559   0.7594  24.8695       --  18.5785
8      0.9815  22.7050       --  29.1693   0.6025  26.8805       --  20.6735
9          --       --       --  31.8814   0.5171  28.2075       --  22.7978
10         --       --   0.2852  33.8777   0.4622  29.1828   0.7898  24.9122


Table 4: The Euclidean norm of the error and the signal to noise ratio for segments #5, 
#6, #7, and #8 using the new initialization procedure. 



Vocoder Type       kb/s  IRS MOS   IRS EQ  Flat MOS  Flat EQ
G.726(ADPCM)         32     3.77    27.87      3.70    35.00
G.728(LDCELP)        16     3.88    30.38      3.77    35.00
GSM(RPE-LTP)         13     3.63    25.58      3.56    33.25
IS54(VSELP)           8     3.49    23.79      3.47    31.89
source              128     4.10    35.00      4.03    35.00

Table 5: MOS test results for several existing vocoder types [3] 



                         seg# 1                     seg# 2
bit#     SNRo     SNRc        %     SNRo     SNRc        %
1     11.7041   2.2567  19.2816  14.1828   2.9729  20.9609
2     16.2364   7.0158  43.2104  18.1943   7.4079  40.7158
3     20.2139  11.5721  57.2481  22.0268  12.0415  54.6675
4     24.0978  15.1072  62.6911  25.3486  15.5708  61.4267
5     27.1619  18.4081  67.7717  28.5015  19.0172  66.7236
6          --  21.0746  69.8391  31.6454  22.2205  70.2171
7          --  23.0066  70.9875  34.3546  25.1480  73.2013
8     34.3550  24.4221  71.0875  36.7348  27.5006  74.8627
9          --  26.1030  72.4464  38.6917  29.1840  75.4270
10    37.6211  27.8036  73.9041  40.4305  30.9153  76.4654

Table 6: The signal to noise ratios for the open loop encoding and for the closed loop 
reconstructed signal, and the ratio of the latter to the former given in percent. Speech 
segments #1 and #2. 
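The % column in Tables 6-9 is described as the ratio of the closed loop SNR to the open loop SNR in percent, that is, 100 × SNR C / SNR O. The short Python sketch below recomputes it for three rows of segment #1 in Table 6. Because the tabulated SNRs are rounded to four decimals, the recomputed percentages can differ from the table in the last digit or so. The function name `closed_to_open_percent` is only illustrative.

```python
def closed_to_open_percent(snr_open: float, snr_closed: float) -> float:
    """Closed loop SNR expressed as a percentage of the open loop SNR."""
    return 100.0 * snr_closed / snr_open


# (SNR O, SNR C, tabulated %) for segment #1 at bit# 1, 2 and 10 (Table 6).
rows = [(11.7041, 2.2567, 19.2816),
        (16.2364, 7.0158, 43.2104),
        (37.6211, 27.8036, 73.9041)]

for snr_o, snr_c, pct in rows:
    recomputed = closed_to_open_percent(snr_o, snr_c)
    # Small mismatches come from rounding of the four-decimal table entries.
    print(f"recomputed {recomputed:.4f}%  tabulated {pct:.4f}%")
```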
       seg#3                         seg#4
bit#   SNR O     SNR C     %         SNR O     SNR C     %
1      14.7564   3.1509    21.3526   15.1423   2.6584    17.5558
2      18.7843   5.8988    31.4029   19.1231   7.2209    37.7602
3      23.5851   11.0247   46.7443   23.3566   11.6208   49.7536
4      28.8222   16.5897   57.5588   27.3273   16.0145   58.6025
5      33.8082   21.1970   62.6978   31.4017   19.5915   62.3898
6      39.2141   26.5661   67.7462   35.1828   23.7752   67.5762
7      45.2169   32.7065   72.3325   38.9523   26.8870   69.0254
8      50.0510   37.4924   74.9084   41.9132   30.0868   71.7838
9      54.7252   41.7704   76.3276   44.2857   32.4149   73.1949
10     59.7176   46.7031   78.2065   46.0070   33.9909   73.8820
Table 7: The signal to noise ratios for the open loop encoding and for the closed loop 
reconstructed signal, and the ratio of the latter to the former given in percent. Speech 
segments #3 and #4. 



       seg#5                         seg#6
bit#   SNR O     SNR C     %         SNR O     SNR C     %
1      14.4451   2.5381    17.5703   16.0680   2.5715    16.0040
2      18.3325   8.1861    44.6533   20.2947   7.1206    35.0862
3      21.6119   12.3137   56.9767   23.8674   10.1836   42.6677
4      24.4126   16.0508   65.7480   28.2908   14.4147   50.9517
5      26.6311   17.6387   66.2333   32.5703   18.4982   56.7946
6      28.8033   19.4940   67.6798   36.3803   22.3223   61.3581
7      31.0721   21.1743   68.1456   39.9759   26.1559   65.4293
8      33.1476   22.7050   68.4966   43.1481   29.1693   67.6027
9      35.0450   24.2022   69.2316   45.5918   31.8814   69.9278
10     36.7098   25.5872   69.7013   47.5695   33.8777   71.2172

Table 8: The signal to noise ratios for the open loop encoding and for the closed loop 
reconstructed signal, and the ratio of the latter to the former given in percent. Speech 
segments #5 and #6. 
       seg#7                         seg#8
bit#   SNR O     SNR C     %         SNR O     SNR C     %
1      13.9421   2.6128    18.7404   11.0268   1.4402    13.0606
2      18.2153   7.4177    40.7222   14.5319   4.8650    33.4780
3      22.9482   11.7199   51.0712   18.2763   7.4338    40.6746
4      26.6642   15.9383   59.7741   22.0020   10.8409   49.2725
5      30.9059   19.6379   63.5411   24.9070   14.1309   56.7344
6      34.0348   22.2772   65.4542   27.2988   16.5717   60.7049
7      36.9603   24.8695   67.2869   29.5096   18.5785   62.9577
8      39.3135   26.8805   68.3746   31.4522   20.6735   65.7299
9      41.1298   28.2075   68.5818   33.0869   22.7978   68.9027
10     42.3422   29.1828   68.9214   34.9470   24.9122   71.2856

Table 9: The signal to noise ratios for the open loop encoding and for the closed loop 
reconstructed signal, and the ratio of the latter to the former given in percent. Speech 
segments #7 and #8. 



       SNR
bit#   seg#1     seg#2     seg#3     seg#4     seg#5     seg#6     seg#7     seg#8
1      2.4679    3.0863    3.3356    2.7293    2.7171    2.5986    2.6743    1.6407
2      7.5290    8.0253    6.6427    7.3882    8.2111    7.3181    7.5183    5.0847
3      12.2880   12.5018   11.3505   11.8651   12.5419   10.3847   11.7578   7.8394
4      15.7835   16.0819   16.7848   16.2313   16.2452   15.0043   15.9949   11.4344
5      19.0602   19.6372   21.2640   19.8404   17.9537   19.2594   19.7424   14.6990
6      21.8042   22.6536   26.6003   23.9664   19.8226   23.0186   22.3635   17.3232
7      23.7572   25.5528   32.7834   27.0377   21.6392   26.8645   24.9577   19.6836
8      25.1132   27.8093   37.5280   30.1814   23.1348   29.6485   27.0550   21.5921
9      26.8069   29.4403   41.8666   32.5186   24.6874   32.2450   28.3679   24.0594
10     28.4514   31.1397   46.8472   34.0583   25.9113   34.1923   29.4471   25.8977

Table 10: The SNR for all segments when the norm of the error in the closed loop
reconstruction is minimized.
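Table 10 can be set against the SNR C columns of Tables 6-9, which give the closed loop SNR when the open loop error is minimized. Assuming Table 10 reports the same quantity (the SNR of the closed loop reconstruction), the Python sketch below differences the rows for bit# 10. For these rows the gain from minimizing the closed loop error directly lies between about 0.07 dB (segment #4) and about 0.99 dB (segment #8). The list names are illustrative.

```python
# Closed loop SNR (dB) at bit# 10 for segments #1 to #8.
# Open loop error minimized: SNR C columns of Tables 6-9.
snr_c_open_opt = [27.8036, 30.9153, 46.7031, 33.9909,
                  25.5872, 33.8777, 29.1828, 24.9122]
# Closed loop error minimized: Table 10 (ASSUMED to report the same quantity).
snr_c_closed_opt = [28.4514, 31.1397, 46.8472, 34.0583,
                    25.9113, 34.1923, 29.4471, 25.8977]

# Gain in dB from minimizing the closed loop error directly.
gains = [round(b - a, 4) for a, b in zip(snr_c_open_opt, snr_c_closed_opt)]
for seg, gain in enumerate(gains, start=1):
    print(f"seg#{seg}: {gain:+.4f} dB")
print(f"largest gain: {max(gains):.4f} dB (seg#{gains.index(max(gains)) + 1})")
```

The same comparison can be repeated for any other bit# row by copying the corresponding entries.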